class: center, middle, inverse, title-slide .title[ # Geospatial Techniques for Social Scientists ] .subtitle[ ## Applied Data Wrangling & Linking ] .author[ ### Stefan Jünger ] .institute[ ###
Methods and Data Hub, Cluster of Excellence “The Politics of Inequality”
] .date[ ### July 15, 2022 ] --- layout: true --- ## Now <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Time </th> <th style="text-align:left;"> Title </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;color: gray !important;"> July 14 </td> <td style="text-align:left;color: gray !important;"> 09:00am-10:30am </td> <td style="text-align:left;font-weight: bold;"> Introduction to GIS </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> July 14 </td> <td style="text-align:left;color: gray !important;"> 10:30am-12:00pm </td> <td style="text-align:left;font-weight: bold;"> Vector Data </td> </tr> <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> July 14 </td> <td style="text-align:left;color: gray !important;color: gray !important;"> 12:00pm-01:00pm </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Lunch Break </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> July 14 </td> <td style="text-align:left;color: gray !important;"> 01:00pm-02:30pm </td> <td style="text-align:left;font-weight: bold;"> Mapping </td> </tr> <tr> <td style="text-align:left;color: gray !important;border-bottom: 1px solid"> July 14 </td> <td style="text-align:left;color: gray !important;border-bottom: 1px solid"> 02:30pm-04:00pm </td> <td style="text-align:left;font-weight: bold;border-bottom: 1px solid"> Raster Data </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> July 15 </td> <td style="text-align:left;color: gray !important;"> 09:00am-10:00am </td> <td style="text-align:left;font-weight: bold;"> Advanced Data Import & Processing </td> </tr> <tr> <td style="text-align:left;color: gray !important;background-color: yellow !important;"> July 15 </td> <td style="text-align:left;color: gray !important;background-color: yellow !important;"> 10:00am-11:00am </td> <td 
style="text-align:left;font-weight: bold;background-color: yellow !important;"> Applied Data Wrangling & Linking </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> July 15 </td> <td style="text-align:left;color: gray !important;"> 11:00am-12:00pm </td> <td style="text-align:left;font-weight: bold;"> Investigating Spatial Autocorrelation </td> </tr> <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> July 15 </td> <td style="text-align:left;color: gray !important;color: gray !important;"> 12:00pm-01:00pm </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Lunch Break </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> July 15 </td> <td style="text-align:left;color: gray !important;"> 01:00pm-02:45pm </td> <td style="text-align:left;font-weight: bold;"> Spatial Econometrics & Outlook </td> </tr> </tbody> </table> --- ## What Are Georeferenced Data? .pull-left[ </br> Data with a direct spatial reference `\(\rightarrow\)` **geo-coordinates** - Information about geometries - Optional: Content in relation to the geometries ] .pull-right[ <img src="data:image/png;base64,#../img/fig_geometries.png" width="85%" style="display: block; margin: auto;" /> .tinyisher[Sources: OpenStreetMap / GEOFABRIK (2018), City of Cologne (2014), and the Statistical Offices of the Federation and the Länder (2016) / Jünger, 2019] ] --- ## Georeferenced Survey Data Survey data enriched with geo-coordinates (or other direct spatial references) </br> <img src="data:image/png;base64,#../img/geo_surveys.png" width="85%" style="display: block; margin: auto;" /> </br> .center[**With georeferenced survey data, we can analyze interactions between individual behaviors and attitudes and the environment.**] --- ## Prerequisite: Geocoding .pull-left[ Indirect spatial references have to be converted into direct spatial references `\(\rightarrow\)` Addresses to geo-coordinates Different service providers can be used - 
e.g., Google, Bing, OSM
- In Germany: Federal Agency for Cartography and Geodesy (BKG)
]

.pull-right[
</br>
</br>
<img src="data:image/png;base64,#../img/geocoding.png" width="785" style="display: block; margin: auto;" />
]

---

## Georeferenced Survey Data `\(\neq\)` Geospatial Data

.pull-left[
We must not store geo-coordinates and survey data in one dataset

- Differences to geospatial data
- More complicated workflow to work with (see Challenges)
]

.pull-right[
<img src="data:image/png;base64,#../img/fig_linking_workflow_simple.png" width="85%" style="display: block; margin: auto;" />

.right[.tinyisher[Jünger, 2019]]
]

---

## Data Availability

.pull-left[
Geospatial Data

- Often distributed in a decentralized way
- Fragmented data landscape, at least in Germany

Georeferenced Survey Data

- Primarily survey data
- Depends on documentation
- Access difficult due to data protection restrictions
]

.pull-right[
<img src="data:image/png;base64,#../img/data_availability.png" width="75%" style="display: block; margin: auto;" />

.right[.tinyisher[
https://www.eea.europa.eu/data-and-maps
https://datasearch.gesis.org/
https://datasetsearch.research.google.com/
]]
]

---

## Technical Procedures

.pull-left[
</br>
.center[<img src="./img/angry_cat.gif" width="75%">]
.center[.tinyisher[https://giphy.com/gifs/VbnUQpnihPSIgIXuZv]]
]

.pull-right[
Geocoding

- Reasonably automated procedure
- But services differ in quality and access rights
- High risk for data protection

GIS procedures

- Require specialized software
- Can get complex and resource-intensive
]

---

## Data Protection

</br>
</br>

That's one of the biggest issues

- Explicit spatial references increase the risk of re-identifying anonymized survey respondents
- Can occur during the processing of data but also during the analysis

</br>

.center[**Affects all phases of research and data management!**]

---

## Legal Regulations

.pull-left[
Storing personal information such as addresses in the same place as actual survey attributes is
not allowed in Germany - Projects keep them in separate locations - Can only be matched with a correspondence table - Necessary to conduct data linking ] .pull-right[ <img src="data:image/png;base64,#../img/fig_linking_workflow_simple.png" width="949" style="display: block; margin: auto;" /> .right[.tinyisher[Jünger, 2019]] ] --- ## Distribution & Re-Identification Risk Data may still be sensitive - Geospatial attributes add new information to existing data - May be part of general data privacy checks, but we may not distribute these data as is .pull-left[ Safe Rooms / Secure Data Centers - Control access - Checks output ] .pull-right[ <img src="data:image/png;base64,#../img/safe_room.png" width="825" style="display: block; margin: auto;" /> .right[.tinyisher[https://www.gesis.org/en/services/processing-and-analyzing-data/guest-research-stays/secure-data-center-sdc]] ] --- ## Spatial Linking <img src="data:image/png;base64,#../img/fig_3d_.png" width="45%" style="display: block; margin: auto;" /> .tinyisher[Sources: OpenStreetMap / GEOFABRIK (2018), City of Cologne (2014), Leibniz Institute of Ecological Urban and Regional Development (2018), Statistical Offices of the Federation and the Länder (2016), and German Environmental Agency / EIONET Central Data Repository (2016) / Jünger, 2019] --- ## Spatial Linking Methods (Examples) I .pull-left[ 1:1 <img src="data:image/png;base64,#../img/fig_linking_by_location_noise.png" width="75%" style="display: block; margin: auto;" /> ] .pull-right[ Distances <img src="data:image/png;base64,#../img/fig_linking_distance_noise_appI.png" width="75%" style="display: block; margin: auto;" /> ] .tinyisher[Sources: German Environmental Agency / EIONET Central Data Repository (2016) and OpenStreetMap / GEOFABRIK (2018) / Jünger, 2019] --- ## Spatial Linking Methods (Examples) II .pull-left[ Filter methods <img src="data:image/png;base64,#../img/fig_linking_focal_immigrants.png" width="75%" style="display: block; margin: auto;" /> ] 
.pull-right[
Buffer zones

<img src="data:image/png;base64,#../img/fig_linking_buffer_sealing.png" width="75%" style="display: block; margin: auto;" />
]

.tinyisher[Sources: Leibniz Institute of Ecological Urban and Regional Development (2018) and Statistical Offices of the Federation and the Länder (2016) / Jünger, 2019]

---

## Environmental inequalities (Jünger, 2021)

> Is income associated with fewer environmental disadvantages, and are there differences between German people and people with a migration background?

.pull-left[
.small[
Theoretical Framework

- Social and Ethnic Inequalities (Crowder & Downey, 2010)
- Place Stratification (Lersch, 2013)

Data

- GGSS 2016 & 2018
- Soil sealing & green spaces
]
]

.pull-right[
<img src="data:image/png;base64,#../img/fig_linking_buffer_sealing.png" width="65%" style="display: block; margin: auto;" />

.tinyisher[Leibniz Institute of Ecological Urban and Regional Development (2018) / Jünger, 2019]
]

---

## Results

<img src="data:image/png;base64,#../img/FIGURE_2.png" width="70%" style="display: block; margin: auto;" />

.tinyisher[Data source: GGSS 2016 & 2018; N = 6,117; 95% confidence intervals based on cluster-robust standard errors (sample point); all models control for age, gender, education, household size, German region and survey year interaction, inhabitant size of the municipality, and distance to municipality administration]

---

## Attitudes towards minorities (Jünger & Schaeffer, 2022)

> Do people who live in ethnically homogeneous neighborhoods that are close to ethnically diverse ones have more negative attitudes towards minorities?
.pull-left[
.small[
Theoretical Framework

- Contact Theory (Allport, 1954)
- Ethnic Competition (Stephan et al., 2009)

Data

- GGSS 2016
- German Census 2011
]
]

.pull-right[
<img src="data:image/png;base64,#../img/Abb1.png" width="65%" style="display: block; margin: auto;" />

.tinyisher[German Census 2011, OpenStreetMap / Jünger & Schaeffer, 2022]
]

---

## Results

<img src="data:image/png;base64,#../img/Abb2.png" width="70%" style="display: block; margin: auto;" />

.tinyisher[Data source: GGSS 2016; N = 1,689; 95% confidence intervals based on cluster-robust standard errors (sample point); all models control for age, gender, education, income, unemployment, homeownership, immigrants and inhabitants in the neighborhood, inhabitant size of the municipality, and German region]

---
class: middle

## Add-on slides: Simulated case study

---

## Fake Research Question

.pull-left[
Say we're interested in the impact of the current pandemic on individual well-being in the geographic context. We plan to conduct a survey in the states of North Rhine-Westphalia and Baden-Württemberg.
] .pull-right[ </br> <img src="data:image/png;base64,#../img/4iq3kg.jpg" width="813" style="display: block; margin: auto;" /> .center[.tinyisher[https://imgflip.com/memegenerator/Trump-Bill-Signing] ] ] --- ## Our Sample Area: NRW's and BW's Boundaries .pull-left[ ```r nrw <- osmdata::getbb( "Nordrhein-Westfalen", format_out = "sf_polygon" ) %>% .$multipolygon %>% sf::st_transform(3035) bw <- osmdata::getbb( "Baden-Württemberg", format_out = "sf_polygon" ) %>% .$multipolygon %>% sf::st_transform(3035) sampling_area <- dplyr::bind_rows( nrw %>% dplyr::mutate(state = "nrw"), bw %>% dplyr::mutate(state = "bw") ) ``` ] -- .pull-right[ ```r tm_shape(sampling_area) + tm_borders() ``` <img src="data:image/png;base64,#2_2_Applied_Data_Wrangling_files/figure-html/nrw-map-1.png" style="display: block; margin: auto;" /> ] --- ## A Fake-Life Application .pull-left[ Let's sample 1,000 people to interview them about their lives. We can draw a fake sample this way and also add an identifier for the respondents: ```r set.seed(1234) ``` ```r fake_coordinates <- sf::st_sample(sampling_area, 1000) %>% sf::st_sf() %>% dplyr::mutate( id_2 = stringi::stri_rand_strings(10000, 10) %>% sample(1000, replace = FALSE) ) ``` ] -- .pull-right[ ```r tm_shape(sampling_area) + tm_borders() + tm_shape(fake_coordinates) + tm_dots() ``` <img src="data:image/png;base64,#2_2_Applied_Data_Wrangling_files/figure-html/map-osm-coordinates-1.png" style="display: block; margin: auto;" /> ] --- ## Correspondence Table As in any survey that deals with addresses, we need a correspondence table of the distinct identifiers. 
```r
correspondence_table <-
  dplyr::bind_cols(
    id =
      stringi::stri_rand_strings(10000, 10) %>%
      sample(1000, replace = FALSE),
    id_2 = fake_coordinates$id_2
  )

correspondence_table
```

```
## # A tibble: 1,000 × 2
##    id         id_2
##    <chr>      <chr>
##  1 ZB7Z1ZZeXP O7fTi57jEz
##  2 yJvlYtinvl GR3DU6RNwC
##  3 eFfTgwO7q9 HIROndK7fZ
##  4 lKkUPbCnkQ r0II3cKoc6
##  5 HzCinC6PAv fffsx5Ka5Q
##  6 K2MFPDBd2Z q46L4y6sHb
##  7 31XssNjrKi vEMkSbk41Y
##  8 2mC4nCZ2xL QbFp2Km7CN
##  9 reRcF9A6jY xPovR3CEG5
## 10 tk2G0MsTU2 YFL4QpbWjx
## # … with 990 more rows
```

---

## Conduct the Survey

We ask respondents for some standard sociodemographics. But we also apply a new and highly innovative item score called the Fake Corona Burden Score (FCBS), which we simulate using the [`faux` package](https://cran.r-project.org/web/packages/faux/index.html).

```r
fake_survey_data <-
  dplyr::bind_cols(
    id = correspondence_table$id,
    age = sample(18:100, 1000, replace = TRUE),
    gender =
      sample(1:2, 1000, replace = TRUE) %>%
      as.factor(),
    education =
      sample(1:4, 1000, replace = TRUE) %>%
      as.factor(),
    income = sample(100:10000, 1000, replace = TRUE),
    # placeholder: the simulated score is intentionally not shown
    fcbs = secret_variable_we_are_hiding_from_you
  )
```

---

## Survey Data Structure

```r
fake_survey_data
```

```
## # A tibble: 1,000 × 6
##    id           age gender education income  fcbs
##    <chr>      <int> <fct>  <fct>      <int> <dbl>
##  1 ZB7Z1ZZeXP    83 1      2            246  44.9
##  2 yJvlYtinvl    38 2      3           1985  42.0
##  3 eFfTgwO7q9    30 1      4           8382  58.1
##  4 lKkUPbCnkQ    99 1      1           8133  53.5
##  5 HzCinC6PAv    49 2      1           1940  62.1
##  6 K2MFPDBd2Z    50 2      2           5134  68.2
##  7 31XssNjrKi    46 1      1           3884  49.2
##  8 2mC4nCZ2xL    84 1      3           2137  38.8
##  9 reRcF9A6jY    22 2      4           6441  58.9
## 10 tk2G0MsTU2    42 1      1           4547  56.7
## # … with 990 more rows
```

---

## What could explain our Fake Corona Burden Score?

*Likelihood to meet people*

> 1) The lower the district's population density, the lower the Fake Corona Burden Score.
--

*District relatively highly affected by Covid-19*

> 2) If Covid-19 deaths in the district are higher than in the state, the Fake Corona Burden Score is lower.

---

## What could explain our Fake Corona Burden Score?

*Provision of health services*

> 3) The closer the nearest hospital to the respondent, the lower the Fake Corona Burden Score.

--

*Possible language issues in health care communication*

> 4) The higher the immigrant rate in the neighborhood, the higher the Fake Corona Burden Score.

---

## Variable Overview

Good thing: We have all the data to calculate these indicators, although we first need to transform some of the variables.

1. Calculate the variables at the district and state levels.
2. Link the district and state-level data with the fake survey respondents.
3. Calculate the hospital distance and immigrant rate.

<img src="data:image/png;base64,#../img/overview_table.png" width="90%" style="display: block; margin: auto;" />

---

## Population Density

When all data sets are loaded, we can calculate the districts' area for the population density.

<small><small>Remember: We reduced our sample to North Rhine-Westphalia and Baden-Württemberg and dropped some variables.</small></small>

.smaller[
```r
# calculate area of districts
sf::st_area(sampling_area_districts_enhanced) %>%
  head(4)
```

```
## Units: [m^2]
## [1] 1269653841 1991051810  797826703  694472630
```

```r
# areas will always be calculated
# in units according to the CRS
sampling_area_districts_enhanced %>%
  dplyr::mutate(area = sf::st_area(.)) %>%
  dplyr::select(area)
```

```
## Simple feature collection with 138 features and 1 field
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 4031313 ymin: 2684076 xmax: 4397808 ymax: 3337853
## Projected CRS: ETRS89-extended / LAEA Europe
## # A tibble: 138 × 2
##           area geometry
##          [m^2] <MULTIPOLYGON [m]>
##  1 1269653841. (((4323993 3203598, 4324507 3202914, 4324737 3202687, 4324920 3202646, 4325091 32023...
##  2 1991051810.
(((4238609 3327711, 4238619 3327497, 4238662 3327236, 4238715 3327104, 4238688 33269...
##  3  797826703. (((4284723 3239656, 4284886 3239651, 4285048 3239787, 4285086 3239475, 4284857 32393...
##  4  694472630. (((4291874 3214346, 4291971 3214252, 4291991 3213730, 4292086 3213449, 4292213 32133...
##  5 1400950815. (((4265129 3310624, 4265143 3310552, 4265158 3310475, 4265181 3310400, 4265236 33102...
##  6  676259852. (((4272308 3259207, 4272316 3259144, 4272325 3259106, 4272330 3259076, 4272341 32590...
##  7  120191058. (((4186493 3247280, 4186547 3247144, 4187104 3247174, 4187249 3247185, 4187535 32472...
##  8 2883718972. (((4138763 3337307, 4138937 3337295, 4138977 3337314, 4138997 3337356, 4139001 33373...
##  9  982075046. (((4103569 3287402, 4103654 3287386, 4103715 3287396, 4103775 3287416, 4103848 32874...
## 10 2121108106. (((4177662 3290021, 4177688 3290019, 4177720 3290024, 4177783 3290037, 4177815 32900...
## # … with 128 more rows
```
]

---

## Population Density

All that is left to do is a simple mutation. Let's pipe it!

.pull-left[
```r
# calculate population density
sampling_area_districts_enhanced <-
  sampling_area_districts_enhanced %>%
  # calculate area
  dplyr::mutate(area = sf::st_area(.)) %>%
  # change unit to square kilometers
  dplyr::mutate(area_km2 = units::set_units(area, km^2)) %>%
  # recode variable as numeric
  dplyr::mutate(area_km2 = as.numeric(area_km2)) %>%
  # calculate population density
  dplyr::mutate(pop_dens = population / area_km2)
```
]

.pull-right[
<img src="data:image/png;base64,#2_2_Applied_Data_Wrangling_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />
]

---

## Aggregate Covid-19 Deaths

To calculate Covid-19 death rates at the state level, we need to aggregate the district-level values.
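
What aggregating district values to the state level means can be sketched with a few made-up numbers. The tibble `toy_districts` and all its values are hypothetical and not part of the workshop data. The sketch also contrasts a plain mean of district rates with a population-weighted mean, which is an alternative not used in these slides.

```r
library(dplyr)

# hypothetical toy data: four districts in two states
toy_districts <- dplyr::tibble(
  district_id = c("a", "b", "c", "d"),
  state       = c("nrw", "nrw", "bw", "bw"),
  population  = c(100000, 300000, 200000, 200000),
  death_rate  = c(0.2, 0.4, 0.1, 0.5)
)

toy_states <-
  toy_districts %>%
  dplyr::group_by(state) %>%
  dplyr::summarize(
    # plain mean: every district counts the same
    death_rate_state = mean(death_rate),
    # weighted mean: larger districts count more (assumption, not in the slides)
    death_rate_state_weighted = stats::weighted.mean(death_rate, population)
  )

toy_states
```

The two versions agree when districts are equally large and diverge otherwise, so which one is appropriate depends on how the state-level comparison should be interpreted.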
.pull-left[ ```r # check data sets sampling_area_districts_enhanced ``` ``` ## Simple feature collection with 138 features and 9 fields ## Geometry type: MULTIPOLYGON ## Dimension: XY ## Bounding box: xmin: 4031313 ymin: 2684076 xmax: 4397808 ymax: 3337853 ## Projected CRS: ETRS89-extended / LAEA Europe ## # A tibble: 138 × 10 ## district_id state geometry population death_rate death7_lk afd_voteshare_2… area area_km2 pop_dens ## * <chr> <chr> <MULTIPOLYGON [m]> <dbl> <dbl> <dbl> <dbl> [m^2] <dbl> <dbl> ## 1 03155 nrw (((4323993 3203598, 4324507 3202914, 4324737 3202687, 4324… 131772 0.345 0 8.03 1.27e9 1270. 104. ## 2 03251 nrw (((4238609 3327711, 4238619 3327497, 4238662 3327236, 4238… 218072 0.212 0 7.02 1.99e9 1991. 110. ## 3 03252 nrw (((4284723 3239656, 4284886 3239651, 4285048 3239787, 4285… 148580 0.362 1 8.88 7.98e8 798. 186. ## 4 03255 nrw (((4291874 3214346, 4291971 3214252, 4291991 3213730, 4292… 70207 0.657 0 9.04 6.94e8 694. 101. ## 5 03256 nrw (((4265129 3310624, 4265143 3310552, 4265158 3310475, 4265… 121645 0.513 0 8.62 1.40e9 1401. 86.8 ## 6 03257 nrw (((4272308 3259207, 4272316 3259144, 4272325 3259106, 4272… 158406 0.368 0 8.67 6.76e8 676. 234. ## 7 03404 nrw (((4186493 3247280, 4186547 3247144, 4187104 3247174, 4187… 164223 0.270 0 4.45 1.20e8 120. 1366. ## 8 03454 nrw (((4138763 3337307, 4138937 3337295, 4138977 3337314, 4138… 328930 0.238 0 6.09 2.88e9 2884. 114. ## 9 03456 nrw (((4103569 3287402, 4103654 3287386, 4103715 3287396, 4103… 137891 0.312 0 5.16 9.82e8 982. 140. ## 10 03459 nrw (((4177662 3290021, 4177688 3290019, 4177720 3290024, 4177… 359471 0.327 0 6.21 2.12e9 2121. 169. 
## # … with 128 more rows
```
]

--

.pull-right[
```r
# check data sets
sampling_area
```

```
## Simple feature collection with 2 features and 1 field
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 4031317 ymin: 2713678 xmax: 4357500 ymax: 3269987
## Projected CRS: ETRS89-extended / LAEA Europe
##   state                       geometry
## 1   nrw MULTIPOLYGON (((4031317 311...
## 2    bw MULTIPOLYGON (((4134155 273...
```
]

---

## Calculate Districts Within States

We rely here on the `sf` function `st_join()` and the `dplyr` functions `group_by()` and `summarize()`.

The first step is a join of two sf objects. You always join based on the first-named object (x lying within which y?).

```r
# step one: which state does each district lie within?
sf::st_join(
  sampling_area_districts_enhanced,
  sampling_area,
  join = sf::st_within
) %>%
  head(., 2)
```

```
## Simple feature collection with 2 features and 10 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 4205260 ymin: 3160220 xmax: 4332216 ymax: 3327915
## Projected CRS: ETRS89-extended / LAEA Europe
## # A tibble: 2 × 11
##   district_id state.x geometry population death_rate death7_lk afd_voteshare_2… area area_km2 pop_dens state.y
##   <chr> <chr> <MULTIPOLYGON [m]> <dbl> <dbl> <dbl> <dbl> [m^2] <dbl> <dbl> <chr>
## 1 03155 nrw (((4323993 3203598, 4324507 3202914, 4324737 3202… 131772 0.345 0 8.03 1.27e9 1270. 104. <NA>
## 2 03251 nrw (((4238609 3327711, 4238619 3327497, 4238662 3327… 218072 0.212 0 7.02 1.99e9 1991. 110. <NA>
```

---

## A Word on Spatial Relations

Every time we want to use a spatial join, meaning matching two spatial objects based on their geometric relation, we need to think about the type of relation we assume. `st_join()` asks us to define the "join" type (the default is `st_intersects`).
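
How much the choice of predicate matters can be sketched with toy geometries. The objects `toy_state` and `toy_districts` are hypothetical squares, not the workshop data. One district lies fully inside the state, the other crosses its border.

```r
library(sf)

# hypothetical state: a 10 x 10 square
toy_state <- sf::st_sf(
  state = "toy_state",
  geometry = sf::st_sfc(
    sf::st_polygon(list(rbind(c(0, 0), c(10, 0), c(10, 10), c(0, 10), c(0, 0))))
  )
)

# hypothetical districts: one inside, one overlapping the border
toy_districts <- sf::st_sf(
  district = c("inside", "overlapping"),
  geometry = sf::st_sfc(
    sf::st_polygon(list(rbind(c(1, 1), c(4, 1), c(4, 4), c(1, 4), c(1, 1)))),
    sf::st_polygon(list(rbind(c(8, 8), c(12, 8), c(12, 12), c(8, 12), c(8, 8))))
  )
)

# left joins with two different predicates
within_join     <- sf::st_join(toy_districts, toy_state, join = sf::st_within)
intersects_join <- sf::st_join(toy_districts, toy_state, join = sf::st_intersects)

within_join$state
intersects_join$state
```

With `st_within`, the overlapping district gets no match and ends up with `NA`, whereas `st_intersects` matches both. Boundaries from different sources rarely align perfectly, so a strict predicate can silently drop matches.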
<img src="data:image/png;base64,#../img/spatial_relations.png" width="90%" style="display: block; margin: auto;" />

.center[
<small><small>https://www.e-education.psu.edu/maps/l2_p5.html</small></small>
]

---

## Group and Summarize at the State Level

Back to our spatial join. After joining the information in which state each district lies (surprise: North Rhine-Westphalia or Baden-Württemberg!), we can summarize the Covid-19 death rates by grouping the districts at the state level.

```r
# second step: group & summarize
sf::st_join(
  sampling_area_districts_enhanced,
  sampling_area %>%
    dplyr::select(-state),
  join = sf::st_within
) %>%
  dplyr::group_by(state) %>%
  dplyr::summarize(death_rate_state = mean(death_rate)) %>%
  head(., 2)
```

```
## Simple feature collection with 2 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 4031313 ymin: 2684076 xmax: 4397808 ymax: 3337853
## Projected CRS: ETRS89-extended / LAEA Europe
## # A tibble: 2 × 3
##   state death_rate_state geometry
##   <chr>            <dbl> <MULTIPOLYGON [m]>
## 1 bw               0.443 (((4208884 2741153, 4208846 2741133, 4208764 2741085, 4208675 2741137, 4208548 27410...
## 2 nrw              0.440 (((4059941 3029719, 4059944 3029560, 4060029 3029491, 4060129 3029490, 4060141 30293...
```

---

## Final Steps

In the last step, we drop the geometries and join the aggregated state-level death rate back to the district data frame. And, voilà, a simple subtraction gives us each district's difference from its state.
.pull-left[
```r
# the complete 'dplyr way'
sampling_area_districts_enhanced <-
  sf::st_join(
    sampling_area_districts_enhanced,
    sampling_area %>%
      dplyr::select(-state),
    join = sf::st_within
  ) %>%
  dplyr::group_by(state) %>%
  dplyr::summarize(
    death_rate_state = mean(death_rate)
  ) %>%
  # we already have geometries, so drop them
  sf::st_drop_geometry() %>%
  # simple left_join
  dplyr::left_join(sampling_area_districts_enhanced, ., by = "state")

# calculate the difference between
# state and district
sampling_area_districts_enhanced <-
  sampling_area_districts_enhanced %>%
  dplyr::mutate(death_diffs = death_rate - death_rate_state)
```
]

.pull-right[
<img src="data:image/png;base64,#2_2_Applied_Data_Wrangling_files/figure-html/incidence-1.png" style="display: block; margin: auto;" />
]

---

## Respondents in Districts

We have population density and the difference in Covid-19 death rates between district and state at the district level. Since our analysis focuses on the individual level, we can spatially join the information to our fake respondents' coordinates.

```r
# join back
spatial_information <-
  sampling_area_districts_enhanced %>%
  # keep just the variables we want
  dplyr::select(district_id, pop_dens, death_diffs) %>%
  # since we want to join districts to
  # respondents, the coordinates come first
  sf::st_join(
    fake_coordinates,
    # district data second
    .,
    # some points may lie on a border,
    # so we choose intersects
    join = sf::st_intersects
  ) %>%
  # drop our coordinates for data protection
  sf::st_drop_geometry()
```

---

## Respondents in Districts

```r
head(spatial_information, 5)
```

```
##         id_2 district_id pop_dens death_diffs
## 1 O7fTi57jEz       05758 555.9444 -0.08027408
## 2 GR3DU6RNwC       05758 555.9444 -0.08027408
## 3 HIROndK7fZ       08136 208.0870  0.05946759
## 4 r0II3cKoc6       05758 555.9444 -0.08027408
## 5 fffsx5Ka5Q       05958 132.1548 -0.12687836
```

---

## Distance to Closest Hospital

.pull-left[
We're getting a little bit more advanced here.
Not in terms of spatial relations or calculations, but concerning data wrangling in general.

We have our survey respondents and have already matched the population density and the difference in Covid-19 death rates between district and state level. For each of our fake respondents, we now want to calculate the straight-line distance to the closest hospital.
]

.pull-right[
<img src="data:image/png;base64,#../img/distance.png" width="65%" style="display: block; margin: auto;" />
]

---

## Distance Calculation

`sf::st_distance()` calculates the distances between **all** respondents and **all** hospitals, resulting in a matrix with 1,786,000 elements (1,000 respondents × 1,786 hospitals). We can make our lives a little bit easier by treating this matrix as a `tibble` or data frame with 1,000 observations and 1,786 variables.

.pull-left[
```r
# distances between each respondent
# and each hospital
distance_matrix <-
  sf::st_distance(
    # point layer "distance from"
    fake_coordinates,
    # point layer "distance to"
    sampling_area_hospitals,
    # dense matrix with all
    # pairwise distances
    by_element = FALSE
  ) %>%
  # making life a little bit easier
  dplyr::as_tibble()

# check our matrix
# again, units = CRS units!
distance_matrix
```
]

.pull-right[
```
## # A tibble: 1,000 × 1,786
##    V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21
##    [m] [m] [m] [m] [m] [m] [m] [m] [m] [m] [m] [m] [m] [m] [m] [m] [m] [m] [m] [m] [m]
##  1 287791. 287902. 249823. 248798. 250631. 2.51e5 2.54e5 2.54e5 2.51e5 2.49e5 2.50e5 2.53e5 2.53e5 2.52e5 2.24e5 2.26e5 2.28e5 2.27e5 2.25e5 2.24e5 2.24e5
##  2 287430. 287548. 248395. 247451. 249297. 2.49e5 2.52e5 2.52e5 2.50e5 2.48e5 2.49e5 2.52e5 2.51e5 2.50e5 2.22e5 2.24e5 2.25e5 2.25e5 2.23e5 2.22e5 2.22e5
##  3 673225. 673426. 617898. 618634. 620619. 6.21e5 6.24e5 6.23e5 6.21e5 6.19e5 6.20e5 6.22e5 6.21e5 6.21e5 5.71e5 5.73e5 5.75e5 5.73e5 5.72e5 5.71e5 5.71e5
##  4 297790. 297917. 256840. 256062. 257934.
2.58e5 2.61e5 2.61e5 2.59e5 2.57e5 2.58e5 2.60e5 2.60e5 2.59e5 2.28e5 2.30e5 2.32e5 2.31e5 2.29e5 2.28e5 2.28e5 ## 5 388291. 388389. 350424. 349535. 351388. 3.51e5 3.55e5 3.55e5 3.52e5 3.50e5 3.51e5 3.54e5 3.53e5 3.53e5 3.21e5 3.23e5 3.25e5 3.24e5 3.22e5 3.21e5 3.21e5 ## 6 333300. 333455. 287259. 286952. 288881. 2.89e5 2.92e5 2.92e5 2.89e5 2.88e5 2.88e5 2.91e5 2.90e5 2.90e5 2.52e5 2.54e5 2.56e5 2.55e5 2.53e5 2.52e5 2.52e5 ## 7 345534. 345633. 308252. 307270. 309109. 3.09e5 3.12e5 3.12e5 3.10e5 3.08e5 3.09e5 3.11e5 3.11e5 3.10e5 2.81e5 2.83e5 2.85e5 2.84e5 2.82e5 2.81e5 2.81e5 ## 8 476950. 476996. 446288. 444964. 446737. 4.47e5 4.50e5 4.50e5 4.47e5 4.46e5 4.46e5 4.49e5 4.49e5 4.48e5 4.22e5 4.23e5 4.25e5 4.24e5 4.22e5 4.21e5 4.21e5 ## 9 461612. 461655. 431762. 430372. 432132. 4.32e5 4.35e5 4.35e5 4.33e5 4.31e5 4.32e5 4.35e5 4.34e5 4.33e5 4.08e5 4.10e5 4.12e5 4.11e5 4.09e5 4.08e5 4.08e5 ## 10 462088. 462136. 431349. 430020. 431792. 4.32e5 4.35e5 4.35e5 4.32e5 4.31e5 4.32e5 4.34e5 4.34e5 4.33e5 4.07e5 4.09e5 4.10e5 4.10e5 4.08e5 4.07e5 4.07e5 ## # … with 990 more rows, and 1,765 more variables: V22 [m], V23 [m], V24 [m], V25 [m], V26 [m], V27 [m], V28 [m], V29 [m], V30 [m], V31 [m], V32 [m], ## # V33 [m], V34 [m], V35 [m], V36 [m], V37 [m], V38 [m], V39 [m], V40 [m], V41 [m], V42 [m], V43 [m], V44 [m], V45 [m], V46 [m], V47 [m], V48 [m], ## # V49 [m], V50 [m], V51 [m], V52 [m], V53 [m], V54 [m], V55 [m], V56 [m], V57 [m], V58 [m], V59 [m], V60 [m], V61 [m], V62 [m], V63 [m], V64 [m], ## # V65 [m], V66 [m], V67 [m], V68 [m], V69 [m], V70 [m], V71 [m], V72 [m], V73 [m], V74 [m], V75 [m], V76 [m], V77 [m], V78 [m], V79 [m], V80 [m], ## # V81 [m], V82 [m], V83 [m], V84 [m], V85 [m], V86 [m], V87 [m], V88 [m], V89 [m], V90 [m], V91 [m], V92 [m], V93 [m], V94 [m], V95 [m], V96 [m], ## # V97 [m], V98 [m], V99 [m], V100 [m], V101 [m], V102 [m], V103 [m], V104 [m], V105 [m], V106 [m], V107 [m], V108 [m], V109 [m], V110 [m], V111 [m], ## # V112 [m], V113 [m], V114 [m], V115 [m], 
V116 [m], V117 [m], V118 [m], V119 [m], V120 [m], V121 [m], …
```
]

---

## Find Minimum Distance

That's all there is concerning the "spatial" part of our data wrangling. From now on, it is just good old (boring) data crunching to get our distance to the closest hospital.

.pull-left[
```r
distance_closest <-
  distance_matrix %>%
  # from unit to numeric
  dplyr::mutate_all(as.numeric) %>%
  # identify the minimum for each row
  # & save it in a variable
  dplyr::mutate(dist_closest_hospital = (apply(., 1, min))) %>%
  # select only the column
  # containing the smallest distance
  dplyr::select(dist_closest_hospital)
```
]

.pull-right[
```
## # A tibble: 1,000 × 1
##    dist_closest_hospital
##                    <dbl>
##  1                 7177.
##  2                 5119.
##  3                10660.
##  4                 6080.
##  5                10661.
##  6                17613.
##  7                 7589.
##  8                 5165.
##  9                 6401.
## 10                 7724.
## # … with 990 more rows
```
]

---

## Get all the information together!

Again, I prefer to work with kilometers rather than meters. And I want to add our new variable to the other spatial information we already prepared.

Luckily, I know that the spatial information table has the same length and order as the fake coordinates. Otherwise, it would be wise to bind via the original coordinate data frame we used for `sf::st_distance()`.

.pull-left[
```r
spatial_information <-
  distance_closest %>%
  # convert to kilometers
  dplyr::mutate(dist_closest_hospital = dist_closest_hospital / 1000) %>%
  # bind columns with spatial information
  # only bind with a data set other than the
  # original coordinates when you are 100
  # percent sure it has the same length and order!
  dplyr::bind_cols(spatial_information, .)
``` ] .pull-right[ ``` ## id_2 district_id pop_dens death_diffs dist_closest_hospital ## 1 O7fTi57jEz 05758 555.94443 -0.0802740755 7.1769192 ## 2 GR3DU6RNwC 05758 555.94443 -0.0802740755 5.1187058 ## 3 HIROndK7fZ 08136 208.08695 0.0594675932 10.6603226 ## 4 r0II3cKoc6 05758 555.94443 -0.0802740755 6.0797974 ## 5 fffsx5Ka5Q 05958 132.15479 -0.1268783629 10.6610723 ## 6 q46L4y6sHb 05762 116.36963 0.0448750538 17.6126863 ## 7 vEMkSbk41Y 05754 376.26479 -0.0900035581 7.5894187 ## 8 QbFp2Km7CN 05382 520.98731 0.0181760619 5.1651725 ## 9 xPovR3CEG5 05382 520.98731 0.0181760619 6.4013152 ## 10 YFL4QpbWjx 05382 520.98731 0.0181760619 7.7236223 ## 11 5Yso5RnugO 05162 783.82839 -0.0405731509 4.9733343 ## 12 M547iBtNyh 08425 146.01760 -0.1069676028 4.5428958 ## 13 w5d6fAeWTm 05124 2107.60149 0.0814873748 3.4912852 ## 14 WxFnIGdqcD 08437 108.70002 -0.1206711475 3.6645189 ## 15 yMMAdqn7lL 05366 155.45360 0.0481721321 4.2989037 ## 16 g5wcDKWuCz 08127 133.33031 0.0483465435 14.3985737 ## 17 jLtHcdZJxt 08215 412.04059 0.0096710668 8.9447827 ## 18 i9e4AgqHU9 08115 636.57455 -0.1033003928 5.3878743 ## 19 q3eKUrwWbK 08236 348.61600 0.0368877917 0.9011041 ## 20 Yl8PcrMvh3 08326 207.49459 0.0229200627 8.0975778 ## 21 dSoRjVOE7u 08436 175.19027 -0.2306501247 12.6256103 ## 22 ILixjNNfI5 08237 135.91128 0.1117571667 8.5548379 ## 23 Q8Gt7HL1Hm 05958 132.15479 -0.1268783629 14.8147212 ## 24 ij7sGLQ7oG 08128 101.68632 -0.0143387664 8.5097311 ## 25 tjMWkLxa8L 05170 441.42773 -0.1175255084 1.3915677 ## 26 TnWWR2gV3M 05954 786.40592 0.1635924196 1.1605018 ## 27 4A8iTh6QQh 08116 834.10623 -0.0045280880 0.6257304 ## 28 RjSCbEu2Vc 08235 200.80118 0.0684648164 13.6047285 ## 29 sjFjDL4MZI 05116 1521.83571 -0.0312585218 3.0356690 ## 30 vqw8tYzkkz 08136 208.08695 0.0594675932 6.7889479 ## 31 TVfzqhKIaE 08436 175.19027 -0.2306501247 8.7618815 ## 32 3Uq1ZMs5QB 08216 314.15154 0.0031697810 3.0404315 ## 33 AfNswvlgPy 08416 439.44729 -0.1206246543 2.2502235 ## 34 QDmoFP8X7R 05754 376.26479 -0.0900035581 
6.8249624 ## 35 vYLTl2ga8w 08136 208.08695 0.0594675932 9.3466382 ## 36 6TFTwHjI4G 05566 249.62364 -0.1455078327 3.7749312 ## 37 hUzBKNXfCZ 08425 146.01760 -0.1069676028 7.5857917 ## 38 03mdYtCgXf 05166 529.70024 -0.0287075384 2.7771277 ## 39 b3Hk0t2gPr 05334 790.16586 0.0359856571 0.9483628 ## 40 XKIq3yQUG9 05513 2460.64463 0.2967109665 2.7474021 ## 41 4VqqaM5jtu 05566 249.62364 -0.1455078327 8.8303838 ## 42 TUfS6aBAJW 05358 281.92118 0.0121631341 8.6313366 ## 43 5Ns4ElbtSz 08436 175.19027 -0.2306501247 4.4271156 ## 44 kpdCocyxRk 08136 208.08695 0.0594675932 3.3811249 ## 45 720gcmF7wL 05754 376.26479 -0.0900035581 12.1317325 ## 46 0czRW8CI3y 05358 281.92118 0.0121631341 4.5362033 ## 47 n8qH3Al3ic 05915 785.90473 0.0081990340 1.6337306 ## 48 xyXh9oCMDQ 08416 439.44729 -0.1206246543 7.8224508 ## 49 59lwaKLnNJ 05566 249.62364 -0.1455078327 8.1967527 ## 50 w2tjGrLdKo 08125 315.27257 -0.1730679985 13.9048089 ## 51 GjRqWOT11e 08325 182.15285 0.0702995398 3.9516944 ## 52 15JyELj7ZA 05154 254.23646 -0.0258392971 3.8286432 ## 53 JnYEbhRS95 08127 133.33031 0.0483465435 9.3547712 ## 54 f3gxkmeLQ1 05362 668.01880 -0.0898995794 0.9391593 ## 55 IVc9JffkNB 08315 192.26363 -0.0687592509 10.3015608 ## 56 z281LoGalB 05962 385.28014 0.0071737291 3.0696118 ## 57 nhhIRJHvos 08415 263.21878 0.0004831449 10.0949932 ## 58 ML4KcF7yWH 08435 327.56409 0.0021880157 13.7477861 ## 59 iiHHR7JyHJ 05774 247.29856 -0.1955966928 12.2749802 ## 60 76k4DMP45N 05966 187.32643 0.0768122765 4.3314970 ## 61 V7yvuXm4Hy 05566 249.62364 -0.1455078327 1.8001089 ## 62 lvTmcOJjfg 08125 315.27257 -0.1730679985 2.1538365 ## 63 8dmcLTjbha 08336 283.51661 0.1201763290 9.5131615 ## 64 VKLrCk41J4 05154 254.23646 -0.0258392971 5.4475335 ## 65 HIexK5hPlk 05913 2101.41275 -0.1527398270 0.4997071 ## 66 UFcOYNRd5q 08226 516.79121 -0.0478656131 1.4622856 ## 67 WsWSMzi5ze 08425 146.01760 -0.1069676028 7.4922686 ## 68 sh7hqR4Mcu 05766 278.50287 0.1156888380 7.7892734 ## 69 hCFbC7GqFc 08128 101.68632 -0.0143387664 8.1348111 
## 70 gHZzGCtOwl 05762 116.36963 0.0448750538 10.9278658 ## 71 eNQIdNSMcB 08417 206.97038 -0.0900173449 4.2638120 ## 72 p6BrrTljsD 05754 376.26479 -0.0900035581 5.8957855 ## 73 nzcQz56dRW 08326 207.49459 0.0229200627 10.0613726 ## 74 Dkm5QTJAp9 08337 151.42175 0.1192169854 7.1484623 ## 75 Kjaej1ByRh 05770 269.32985 -0.0633469583 15.5539863 ## 76 cESKq5A92M 08125 315.27257 -0.1730679985 5.1950324 ## 77 MJb0C1ntlw 05566 249.62364 -0.1455078327 4.2111722 ## 78 ZDoEFh1aTI 05558 198.42971 -0.2688506462 4.3143844 ## 79 y9iP0O6Umw 05558 198.42971 -0.2688506462 7.1715847 ## 80 udztRUoW3L 05958 132.15479 -0.1268783629 16.9414143 ## 81 tYKrEYIYoZ 05966 187.32643 0.0768122765 16.3213389 ## 82 zoVxlfThxB 05770 269.32985 -0.0633469583 16.9334244 ## 83 v0RaBBmUss 08435 327.56409 0.0021880157 5.7394218 ## 84 QfHtu8PqVJ 05911 2516.36577 -0.0132357586 3.9081223 ## 85 M78fbvuTib 05119 2704.54500 0.3976744385 6.3139059 ## 86 1aOtERytqO 05774 247.29856 -0.1955966928 9.2289184 ## 87 SEuAXh5TKv 08117 402.67820 -0.0018583774 5.5312932 ## 88 fLboeZ2cAQ 08337 151.42175 0.1192169854 6.3590251 ## 89 PHBYrB7Z0w 05370 408.82725 0.3059996213 5.5006891 ## 90 wqzivihhEB 08327 193.01362 0.0463484407 10.5397453 ## 91 c5tYqZfwo8 08436 175.19027 -0.2306501247 2.2968151 ## 92 NsmRK2LH5B 05166 529.70024 -0.0287075384 2.0910968 ## 93 gzXmCFKtbV 05154 254.23646 -0.0258392971 6.2214363 ## 94 5RNEkSWaRj 08116 834.10623 -0.0045280880 3.8910856 ## 95 OMiDIljHbi 05334 790.16586 0.0359856571 6.1887027 ## 96 hxefs2RWDi 08435 327.56409 0.0021880157 8.8376278 ## 97 918PuIkBnE 05962 385.28014 0.0071737291 9.9172968 ## 98 fmIHMQrA5X 05978 725.16111 0.1246544538 2.8218814 ## 99 QrWev484Vv 08117 402.67820 -0.0018583774 4.4173467 ## 100 lTUzPRIgBP 05378 646.60085 -0.1647324428 8.4575022 ## 101 BnV2FTfHb5 08317 232.59748 0.0665610848 5.1952035 ## 102 mQj70lFDey 08116 834.10623 -0.0045280880 2.3024657 ## 103 k4AR1IemmA 08225 127.72837 -0.0434328301 4.7535319 ## 104 qkSGULBLnK 05154 254.23646 -0.0258392971 7.3747300 ## 
105 QBV1IrQFIE 05382 520.98731 0.0181760619 7.6896369 ## 106 eLtuhFMOH4 05754 376.26479 -0.0900035581 12.5654488 ## 107 QD1T8zrZD4 08317 232.59748 0.0665610848 4.6987275 ## 108 sG2fAbgduS 05762 116.36963 0.0448750538 19.3802048 ## 109 WKMr5VpGYt 08336 283.51661 0.1201763290 8.2247506 ## 110 L0CL9HhTSg 05366 155.45360 0.0481721321 1.4173391 ## 111 QtIPVZDhpB 05962 385.28014 0.0071737291 9.4086334 ## 112 UOsDIh1aqd 08336 283.51661 0.1201763290 0.9728203 ## 113 wqETvNjB0K 05374 295.89374 0.0241034281 3.1561608 ## 114 Q28RqEZgKa 08317 232.59748 0.0665610848 8.5830843 ## 115 LPJd2in5TC 05766 278.50287 0.1156888380 6.9948860 ## 116 vdHrfJ9yyt 08125 315.27257 -0.1730679985 1.7568195 ## 117 KRMlkJVPXp 08136 208.08695 0.0594675932 9.2101894 ## 118 tBRAdDjuIc 05334 790.16586 0.0359856571 1.1693557 ## 119 a9i7MxThm8 08335 351.04546 -0.0050070137 3.4061744 ## 120 FUGL7cdw66 05762 116.36963 0.0448750538 3.3444580 ## 121 Y7yI7MiJbK 08235 200.80118 0.0684648164 14.6907684 ## 122 EbPbrUvyFy 08125 315.27257 -0.1730679985 7.5519700 ## 123 vO5mufgmLa 08317 232.59748 0.0665610848 7.7042075 ## 124 T0M823AdgM 08436 175.19027 -0.2306501247 10.4517857 ## 125 CQG7V6J2nB 05958 132.15479 -0.1268783629 9.9436876 ## 126 Uukl9d3gNm 08127 133.33031 0.0483465435 12.7117475 ## 127 5bWAOPlIi6 08421 1060.26770 -0.1866207154 8.9441418 ## 128 TgDlsOqy9U 08337 151.42175 0.1192169854 4.1594269 ## 129 aqkexBt2hM 08336 283.51661 0.1201763290 12.2737898 ## 130 RYZvuNoYDw 08315 192.26363 -0.0687592509 3.3260611 ## 131 dsn83ArNC6 08417 206.97038 -0.0900173449 3.7295990 ## 132 l5FIs6YoCy 05315 2663.66608 -0.1617247369 1.1657705 ## 133 OlKaexqes0 05358 281.92118 0.0121631341 5.3138192 ## 134 2EvAnMCjfJ 05762 116.36963 0.0448750538 13.5918381 ## 135 73RTkvqSkB 08426 143.45161 -0.0776801790 9.5404686 ## 136 wZ8MjLy6lu 05774 247.29856 -0.1955966928 13.5231612 ## 137 SDkNqa4Dwy 08326 207.49459 0.0229200627 5.3779418 ## 138 NvTuj4xcF3 05566 249.62364 -0.1455078327 7.6271905 ## 139 J7jIc27oJk <NA> NA NA 9.1724279 ## 
140 sdo7WEwJ6K 05970 242.93087 -0.1534861414 10.0698918 ## 141 Y1v77mcfID 05111 2853.61446 0.0018459746 0.3594995 ## 142 FgRX1nkVzG 05766 278.50287 0.1156888380 11.2889831 ## 143 hzrm3hbPWB 05962 385.28014 0.0071737291 6.2940172 ## 144 VxM3OsOZk2 08117 402.67820 -0.0018583774 13.3266676 ## 145 c5Nne81No8 08117 402.67820 -0.0018583774 12.7618121 ## 146 Kg1sMgzHNT 08317 232.59748 0.0665610848 3.7538745 ## 147 qd6L4MdMQ7 08215 412.04059 0.0096710668 4.8623859 ## 148 YbKhgmhcgK 05566 249.62364 -0.1455078327 7.8514655 ## 149 RwMWpsfWUq 08226 516.79121 -0.0478656131 8.8318219 ## 150 9CSUu9RYAi 05374 295.89374 0.0241034281 12.2868468 ## 151 9LHHB7aHKf 05566 249.62364 -0.1455078327 4.0367246 ## 152 F4L5wWpFV5 08226 516.79121 -0.0478656131 9.4243638 ## 153 dHanLCqkPs 05554 261.65165 -0.1496587346 6.4894109 ## 154 JcQFgQJsft 05770 269.32985 -0.0633469583 4.9788130 ## 155 1IUPL6Al58 05978 725.16111 0.1246544538 4.8752218 ## 156 dsxQdzHkcz 08436 175.19027 -0.2306501247 4.2416221 ## 157 hMfiwAByhG 08237 135.91128 0.1117571667 6.7739425 ## 158 zVUemE7njg 08222 2134.32358 0.0106373029 2.4518424 ## 159 uq9TZQliVD 05766 278.50287 0.1156888380 9.5419829 ## 160 DJPfgldy5V 08435 327.56409 0.0021880157 3.0370179 ## 161 YY3PT4itcG 05122 1788.76737 0.0644885912 0.7414913 ## 162 mnSKbzEJi5 05958 132.15479 -0.1268783629 5.6040007 ## 163 Tw6KaqugsI 08326 207.49459 0.0229200627 9.1088834 ## 164 CszoAqWtOa 05754 376.26479 -0.0900035581 5.2575322 ## 165 0MLELq0a2N 05774 247.29856 -0.1955966928 7.4105020 ## 166 Q7qJjcP6UO 08136 208.08695 0.0594675932 4.3154142 ## 167 6aoY3IBQn3 05762 116.36963 0.0448750538 8.1509789 ## 168 t5OcE1i2SX 08237 135.91128 0.1117571667 1.9753395 ## 169 Bu47sWZ95i 08315 192.26363 -0.0687592509 6.3856617 ## 170 u5mewoG6Ta 08119 498.38193 -0.0395572763 7.3830039 ## 171 H2MCgBo58p 08231 1282.46450 0.2530660691 1.6795341 ## 172 LpKqif65dW 05774 247.29856 -0.1955966928 13.7915798 ## 173 brmgzpVpGN 05754 376.26479 -0.0900035581 8.9290701 ## 174 hLHw63lxOj 08231 1282.46450 
0.2530660691 1.4666768 ## 175 Ay7sbbONtS 05774 247.29856 -0.1955966928 8.9877009 ## 176 jPO7dtpycU 08425 146.01760 -0.1069676028 10.9914236 ## 177 wFXdskQLBC 08115 636.57455 -0.1033003928 4.2425936 ## 178 mhXMx1UoaN 08436 175.19027 -0.2306501247 11.2780310 ## 179 mrceF8L1p0 08119 498.38193 -0.0395572763 2.6276322 ## 180 eM34tQbKEj 08126 145.10239 -0.0107632803 19.5265865 ## 181 a9rTREVr7a 08336 283.51661 0.1201763290 2.0119247 ## 182 FYouyvAlYA 05358 281.92118 0.0121631341 5.9350272 ## 183 tmDnYssZ8p 08235 200.80118 0.0684648164 11.5431368 ## 184 RlWw4z4Aal 05774 247.29856 -0.1955966928 10.6843399 ## 185 uAXKj3BsMz 05554 261.65165 -0.1496587346 8.6082976 ## 186 RYtPyiXyrL 08135 211.68600 0.1612797011 6.2065619 ## 187 Bg134QEfOG 05754 376.26479 -0.0900035581 5.2467802 ## 188 rQAGQVY5P1 05554 261.65165 -0.1496587346 6.2315338 ## 189 khk2y7TEOM 08316 245.24450 -0.0292549563 7.8921012 ## 190 KKL2kkRUsR 08115 636.57455 -0.1033003928 6.3038816 ## 191 taLHx0TkVN <NA> NA NA 6.3294839 ## 192 wO4RMPqwwY 05570 210.39111 -0.0872441520 6.2193234 ## 193 cMGzbattzV 08327 193.01362 0.0463484407 3.1805612 ## 194 JNz5FC4TAw 08116 834.10623 -0.0045280880 4.4610096 ## 195 sbL9nCmGpU 09575 79.86946 -0.0456319148 9.4660969 ## 196 e9ZNlahuYA 08425 146.01760 -0.1069676028 11.9815884 ## 197 VujtUgryy2 08216 314.15154 0.0031697810 6.0583587 ## 198 UMGUve8RAo 08226 516.79121 -0.0478656131 6.6430202 ## 199 TblyqAb4iv 08126 145.10239 -0.0107632803 12.2779588 ## 200 BXYGMJqfuB 08425 146.01760 -0.1069676028 9.5190939 ## [ reached 'max' / getOption("max.print") -- omitted 800 rows ] ``` ] --- ## Immigrant Rate Buffers ...and we're not yet done: we still need the immigrant rate in the neighborhood. Let's calculate buffers of 500 meters and add their mean values to our dataset. 
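
To make the buffer idea concrete, here is a minimal base-R sketch of what a 500-meter buffer mean computes: the average of all grid cells whose centers lie within 500 meters of a point. The toy grid, its values, and the names `cell_centers`, `toy_point`, and `buffer_mean()` are invented for illustration; real raster packages usually treat cells that are only partly covered by the buffer more carefully.

```r
# toy 100 m grid: cell centers from 50 to 1950 meters in both directions
cell_centers <- expand.grid(
  x = seq(50, 1950, by = 100),
  y = seq(50, 1950, by = 100)
)

# made-up cell values: the rate rises from west to east
cell_centers$rate <- cell_centers$x / 100

# hypothetical helper: mean of all cells whose center lies within `radius`
buffer_mean <- function(cells, point, radius) {
  distances <- sqrt(
    (cells$x - point[["x"]])^2 + (cells$y - point[["y"]])^2
  )
  mean(cells$rate[distances <= radius])
}

# a single made-up location in the middle of the grid
toy_point <- c(x = 1000, y = 1000)

toy_buffer_mean <- buffer_mean(cell_centers, toy_point, radius = 500)
toy_buffer_mean
```

Because the toy point sits in the middle of a symmetric grid, the low western values and the high eastern values inside the buffer balance out, and the buffer mean equals the rate at the point's own position.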
```r
# download data & create rate
immigrants <-
  z11::z11_get_100m_attribute(STAATSANGE_KURZ_2)

inhabitants <-
  z11::z11_get_100m_attribute(Einwohner) %>%
  terra::resample(immigrants)

immigrant_rate <- immigrants / inhabitants * 100

# set missings
immigrant_rate[is.na(immigrant_rate)] <- 0

# terra::extract() has no buffer argument,
# so create the 500 m buffers explicitly
immigrant_buffers <-
  terra::extract(
    immigrant_rate,
    terra::buffer(terra::vect(fake_coordinates), width = 500),
    fun = mean
  )

# spatially link with buffers on the fly
spatial_information <-
  spatial_information %>%
  dplyr::mutate(immigrant_buffers = immigrant_buffers[[2]])
```

---

## Join with Fake Burden Score

I hope you're not tired of joining data tables. Since we care a tiny bit more about data protection than others, we have yet another joining task left: joining the information we received using our (protected) fake coordinates to the actual survey data via the correspondence table.

.pull-left[
```r
# last joins for now
fake_survey_data_spatial <-
  # first join the id
  dplyr::left_join(
    correspondence_table,
    spatial_information,
    by = "id_2"
  ) %>%
  # drop the fake_coordinate id
  dplyr::select(-id_2) %>%
  # join the survey data
  dplyr::left_join(
    fake_survey_data,
    by = "id"
  )
```
]

.pull-right[
<img src="data:image/png;base64,#2_2_Applied_Data_Wrangling_files/figure-html/correlation-plot-1.png" width="75%" style="display: block; margin: auto;" />
]

---

## ... and when things get complicated?

In the examples presented here, things work out quite well. However, you will find that things sometimes get a little more complicated.

Challenges like non-matching geometries cause problems for simple spatial joins and force you to decide how to crop the information. For example, ZIP codes and electoral districts do not align with the administrative districts in Germany.

You might also want to refine your measurements and dive deeper into measures like densities, distances to lines or polygons, spatial autocorrelation, and neighborhood matrices.
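
One common way to deal with non-matching geometries is area-weighted interpolation, sketched below with `sf::st_interpolate_aw()` on two made-up source units and one made-up target unit that cuts across them. The helper `make_rectangle()`, the object names, and the population counts are purely illustrative, and the method assumes that the attribute is spread evenly within each source unit, which is rarely exactly true for real populations.

```r
library(sf)

# hypothetical helper: a rectangle of height 1 between xmin and xmax
make_rectangle <- function(xmin, xmax) {
  sf::st_polygon(list(rbind(
    c(xmin, 0), c(xmax, 0), c(xmax, 1), c(xmin, 1), c(xmin, 0)
  )))
}

# two made-up source units with invented population counts
source_units <-
  sf::st_sf(
    population = c(100, 200),
    geometry = sf::st_sfc(make_rectangle(0, 1), make_rectangle(1, 2))
  )

# one made-up target unit covering half of the first
# and all of the second source unit
target_unit <-
  sf::st_sf(geometry = sf::st_sfc(make_rectangle(0.5, 2)))

# extensive = TRUE treats the attribute as a count that is split
# proportionally to the overlapping area: 0.5 * 100 + 1 * 200
# (sf warns that attributes are assumed uniform over the source areas)
interpolated <-
  sf::st_interpolate_aw(
    source_units["population"],
    target_unit,
    extensive = TRUE
  )

interpolated$population
```

For rates or densities, `extensive = FALSE` computes an area-weighted mean instead of redistributing a count.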
All of this is possible using *R/RStudio*! --- class: middle ## Exercise 2_2_1 (optional!): Spatial Joins [Exercise](https://stefanjuenger.github.io/gis_socsci_konstanz/exercises/2_2_1_Spatial_Joins_OPTIONAL.html) [Solution](https://stefanjuenger.github.io/gis_socsci_konstanz/exercises/2_2_1_Spatial_Joins_OPTIONAL.html) <!-- STOP WITH SLIDES HERE --> --- layout: false class: center background-image: url(data:image/png;base64,#../assets/img/the_end.png) background-size: cover .left-column[ </br> <img src="data:image/png;base64,#../img/Stefan.png" width="75%" style="display: block; margin: auto;" /> ] .right-column[ .left[.small[<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M464 64H48C21.49 64 0 85.49 0 112v288c0 26.51 21.49 48 48 48h416c26.51 0 48-21.49 48-48V112c0-26.51-21.49-48-48-48zm0 48v40.805c-22.422 18.259-58.168 46.651-134.587 106.49-16.841 13.247-50.201 45.072-73.413 44.701-23.208.375-56.579-31.459-73.413-44.701C106.18 199.465 70.425 171.067 48 152.805V112h416zM48 400V214.398c22.914 18.251 55.409 43.862 104.938 82.646 21.857 17.205 60.134 55.186 103.062 54.955 42.717.231 80.509-37.199 103.053-54.947 49.528-38.783 82.032-64.401 104.947-82.653V400H48z"></path> </svg> [stefan.juenger@gesis.org](mailto:stefan.juenger@gesis.org)] .small[<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 
216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"></path> </svg> [`@StefanJuenger`](https://twitter.com/StefanJuenger)] .small[<svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path> </svg> [`StefanJuenger`](https://github.com/StefanJuenger)] .small[<svg viewBox="0 0 576 512" style="height:1em;position:relative;display:inline-block;top:.1em;" 
xmlns="http://www.w3.org/2000/svg"> <path d="M280.37 148.26L96 300.11V464a16 16 0 0 0 16 16l112.06-.29a16 16 0 0 0 15.92-16V368a16 16 0 0 1 16-16h64a16 16 0 0 1 16 16v95.64a16 16 0 0 0 16 16.05L464 480a16 16 0 0 0 16-16V300L295.67 148.26a12.19 12.19 0 0 0-15.3 0zM571.6 251.47L488 182.56V44.05a12 12 0 0 0-12-12h-56a12 12 0 0 0-12 12v72.61L318.47 43a48 48 0 0 0-61 0L4.34 251.47a12 12 0 0 0-1.6 16.9l25.5 31A12 12 0 0 0 45.15 301l235.22-193.74a12.19 12.19 0 0 1 15.3 0L530.9 301a12 12 0 0 0 16.9-1.6l25.5-31a12 12 0 0 0-1.7-16.93z"></path> </svg> [`https://stefanjuenger.github.io`](https://stefanjuenger.github.io)]] ]